1 Sample Metadata

treatment age_group patient_id sample Sequence
0 2 3 160008699_3_0_S5 1
1 2 3 160008699_3_8_S6 2
0 2 4 290001824_4_0_S7 3
1 2 4 290001824_4_8_S8 4
0 1 17 330001842_17_0_S31 5
1 1 17 330001842_17_8_S32 6
0 0 5 470009458_5_0_S9 7
1 0 5 470009458_5_4_S10 8
0 1 13 660009823_13_0_S25 9
1 1 13 660009823_13_8_S26 10
0 0 11 770004766_11_0_S21 11
1 0 11 770004766_11_8_S22 12
0 1 2 830001304_2_0_S3 13
1 1 2 830001304_2_4_S4 14
0 2 12 830002078_12_0_S23 15
1 2 12 830002078_12_8_S24 16
0 2 9 880001252_9_0_S17 17
1 2 9 880001252_9_8_S18 18
0 0 8 940004357_8_0_S15 19
1 0 8 940004357_8_8_S16 20
0 1 7 970002731_7_0_S13 21
1 1 7 970002731_7_4_S14 22
0 0 10 980007758_10_0_S19 23
1 0 10 980007758_10_8_S20 24
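
The sample names appear to encode the patient and the sampling time. The sketch below assumes the pattern `<subject>_<patient_id>_<timepoint>_S<index>` (inferred from the table, not stated in the report) and checks that every patient contributes exactly one pre-treatment (treatment = 0) and one post-treatment (treatment = 1) sample.

```python
from collections import defaultdict

# (treatment, sample) pairs copied from the metadata table above.
ROWS = [
    (0, "160008699_3_0_S5"), (1, "160008699_3_8_S6"),
    (0, "290001824_4_0_S7"), (1, "290001824_4_8_S8"),
    (0, "330001842_17_0_S31"), (1, "330001842_17_8_S32"),
    (0, "470009458_5_0_S9"), (1, "470009458_5_4_S10"),
    (0, "660009823_13_0_S25"), (1, "660009823_13_8_S26"),
    (0, "770004766_11_0_S21"), (1, "770004766_11_8_S22"),
    (0, "830001304_2_0_S3"), (1, "830001304_2_4_S4"),
    (0, "830002078_12_0_S23"), (1, "830002078_12_8_S24"),
    (0, "880001252_9_0_S17"), (1, "880001252_9_8_S18"),
    (0, "940004357_8_0_S15"), (1, "940004357_8_8_S16"),
    (0, "970002731_7_0_S13"), (1, "970002731_7_4_S14"),
    (0, "980007758_10_0_S19"), (1, "980007758_10_8_S20"),
]

def parse_sample(name):
    """Split a sample name into its parts (the field meanings are assumed)."""
    subject, patient_id, timepoint, index = name.split("_")
    return {"subject": subject, "patient_id": int(patient_id),
            "timepoint": int(timepoint), "index": index}

# Map patient_id -> {treatment: timepoint}.
by_patient = defaultdict(dict)
for treatment, sample in ROWS:
    info = parse_sample(sample)
    by_patient[info["patient_id"]][treatment] = info["timepoint"]

# A paired design needs one sample per treatment level for every patient.
paired = all(set(levels) == {0, 1} for levels in by_patient.values())
post_timepoints = sorted({levels[1] for levels in by_patient.values()})
print(len(by_patient), paired, post_timepoints)
```

If the third field is indeed a time point, the post-treatment samples of patients 2, 5 and 7 were taken at time point 4 rather than 8.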

2 WGCNA result

Soft threshold = 16
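
To give a feel for what a soft-thresholding power of 16 does, the sketch below raises a few correlation values to that power. It assumes an unsigned network, where the adjacency is |cor|^power; the report does not say which network type was used.

```python
BETA = 16  # soft-thresholding power reported above

def adjacency(correlation, beta=BETA):
    """Unsigned soft-threshold adjacency |cor|^beta (network type is assumed)."""
    return abs(correlation) ** beta

# Weak correlations are pushed toward zero; strong ones are retained.
for r in (0.5, 0.7, 0.9):
    print(f"cor = {r:.1f} -> adjacency = {adjacency(r):.2e}")
```

A high power such as 16 therefore keeps only the strongest co-expression links when modules are built.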

Modules = 29

Module (size) Direction Spearman correlation (p-value)
lightgreen (152 genes) Positive 0.14 (0.1)
darkred (63 genes) Negative -0.12 (0.2)
midnightblue (303 genes) Negative -0.1 (0.2)

3 Run LASSO on treatment-positive module (Module lightgreen: 152 genes)

  • Alpha = 1

  • Nested cross validation

    • outer loop method: leave-one-out
    • inner loop method: leave-one-out
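
The nested scheme listed above can be sketched as follows. This is only an illustration of how the splits nest (the function names are hypothetical, and no model is fitted): the outer loop holds out one sample for testing, and the inner loop tunes lambda using the remaining samples only.

```python
def leave_one_out(indices):
    """Yield (train, held_out) pairs, holding out one index at a time."""
    for i, held_out in enumerate(indices):
        yield indices[:i] + indices[i + 1:], held_out

def nested_loo(n_samples):
    """Enumerate the outer and inner splits of a nested leave-one-out scheme."""
    samples = list(range(n_samples))
    plan = []
    for outer_train, outer_test in leave_one_out(samples):
        # Inner loop: only the outer training samples are available for tuning.
        inner = list(leave_one_out(outer_train))
        plan.append((outer_test, inner))
    return plan

plan = nested_loo(24)  # 24 samples in this report
n_outer = len(plan)
n_inner = sum(len(inner) for _, inner in plan)
print(n_outer, n_inner)  # 24 outer folds, 24 * 23 = 552 inner folds
```

The held-out outer sample never enters the inner loop, so the tuned lambda cannot be influenced by the sample it is later tested on.
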
## Tuned lambda value:
##  0.04336461
## 
## Call:  cv.glmnet(x = x, y = y, weights = ..2, foldid = foldid, alpha = tail(alphaSet,      1), family = ..1, penalty.factor = ..3) 
## 
## Measure: Binomial Deviance 
## 
##      Lambda Index Measure      SE Nonzero
## min 0.04336    41   1.160 0.22381      10
## 1se 0.20128     8   1.376 0.05708       1
## Non-zero Coefficients:
##  ENSG00000152894 ENSG00000084072 ENSG00000058091 ENSG00000112232 ENSG00000173898 ENSG00000101057 ENSG00000026559 ENSG00000168916 ENSG00000134532 ENSG00000166833
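
The `min` and `1se` rows follow glmnet's one-standard-error rule: `1se` is the largest lambda whose cross-validated deviance stays within one standard error of the minimum. The sketch below checks that relationship using the numbers printed in the four cv.glmnet tables of this report (sections 3 to 6); it does not refit anything.

```python
# (label, measure at lambda.min, SE at lambda.min, measure at lambda.1se)
FITS = [
    ("lightgreen", 1.160, 0.22381, 1.376),
    ("darkred", 1.505, 0.10086, 1.506),
    ("midnightblue", 1.443, 0.10450, 1.495),
    ("final (15 genes)", 0.8468, 0.2593, 1.1030),
]

def within_one_se(min_measure, se_at_min, measure_at_1se):
    """True if the 1se deviance lies within one SE of the minimum deviance."""
    return measure_at_1se <= min_measure + se_at_min

checks = {label: within_one_se(m, se, m1) for label, m, se, m1 in FITS}
for label, ok in checks.items():
    print(f"{label}: within one SE of the minimum = {ok}")
```
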

3.1 List of genes with non-zero coefficients (10 Genes)

ensembl_gene_id external_gene_name
ENSG00000026559 KCNG1
ENSG00000058091 CDK14
ENSG00000084072 PPIE
ENSG00000101057 MYBL2
ENSG00000112232 KHDRBS2
ENSG00000134532 SOX5
ENSG00000152894 PTPRK
ENSG00000166833 NAV2
ENSG00000168916 ZNF608
ENSG00000173898 SPTBN2

3.2 AUC

3.3 Accuracy

##          Reference
## Predicted  0  1
##         0 10  6
##         1  2  6
##               AUC          Accuracy Balanced accuracy 
##         0.6597222         0.6666667         0.6666667
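
The accuracy and balanced accuracy above can be recomputed from the confusion matrix. The sketch assumes that class 1 (post-treatment) is the positive class; the AUC cannot be derived from the matrix because it depends on the predicted probabilities.

```python
# Confusion matrix from the output above (rows: predicted, columns: reference).
tn, fn = 10, 6  # predicted 0: reference 0, reference 1
fp, tp = 2, 6   # predicted 1: reference 0, reference 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)  # post-treatment samples called correctly
specificity = tn / (tn + fp)  # pre-treatment samples called correctly
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy = {accuracy:.4f}")
print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
```

Because the two classes have 12 samples each, accuracy and balanced accuracy coincide.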

3.4 TPM (pre vs. post)

sample 160008699_3_0_S5 290001824_4_0_S7 330001842_17_0_S31 470009458_5_0_S9 660009823_13_0_S25 770004766_11_0_S21 830001304_2_0_S3 830002078_12_0_S23 880001252_9_0_S17 940004357_8_0_S15 970002731_7_0_S13 980007758_10_0_S19 160008699_3_8_S6 290001824_4_8_S8 330001842_17_8_S32 470009458_5_4_S10 660009823_13_8_S26 770004766_11_8_S22 830001304_2_4_S4 830002078_12_8_S24 880001252_9_8_S18 940004357_8_8_S16 970002731_7_4_S14 980007758_10_8_S20
KCNG1 0.65 0.89 0.51 0.55 0.22 0.13 0.14 0.32 0.29 0.09 0.22 1.01 0.94 1.28 0.93 1.03 0.13 0.26 0.37 0.30 0.48 0.60 1.09 1.19
CDK14 6.49 7.49 3.21 4.61 3.87 3.58 7.03 2.71 7.50 3.65 5.13 5.42 5.21 8.77 5.54 5.35 8.58 6.32 10.43 2.10 5.82 6.42 6.11 6.72
PPIE 21.15 17.39 12.59 11.87 12.76 7.84 18.19 10.75 18.17 13.71 10.27 16.09 16.44 17.77 10.86 12.52 5.29 9.87 14.59 7.85 11.21 8.37 9.80 12.64
MYBL2 1.22 0.89 1.86 1.61 2.09 0.80 1.46 1.09 0.95 0.77 3.31 4.64 1.04 2.54 1.99 1.90 0.76 1.07 0.99 0.16 0.77 0.85 0.28 4.22
KHDRBS2 0.52 0.31 0.38 0.25 0.30 0.10 0.37 0.13 0.48 0.16 0.19 0.39 0.46 0.48 0.41 0.27 0.20 0.15 0.34 0.71 0.36 0.40 0.75 0.58
SOX5 0.04 1.80 0.63 1.62 0.89 0.11 2.10 0.43 1.28 0.42 0.30 0.58 0.17 0.37 0.36 2.58 0.19 0.34 0.97 0.11 0.53 0.71 0.28 0.70
PTPRK 0.74 0.58 1.01 1.07 0.91 0.44 1.95 0.60 1.38 0.93 0.51 1.50 1.37 2.68 2.35 3.05 1.00 0.76 2.48 0.59 1.72 1.78 1.39 2.82
NAV2 0.71 0.47 0.35 0.88 0.73 0.06 1.10 0.51 0.29 1.05 0.46 0.31 0.63 0.87 0.23 0.27 0.27 0.05 0.65 0.26 0.07 1.44 0.25 0.28
ZNF608 0.68 1.06 0.76 1.04 0.39 0.14 1.20 0.76 0.43 0.21 1.24 0.91 0.40 0.86 0.44 0.98 0.50 0.46 0.58 0.31 0.49 0.27 0.17 1.19
SPTBN2 0.65 0.36 0.24 0.27 0.20 0.01 0.41 0.70 1.61 0.07 0.11 0.44 0.05 0.19 0.09 0.21 0.28 0.10 0.13 0.03 0.19 0.36 0.69 0.06
treatment pre pre pre pre pre pre pre pre pre pre pre pre post post post post post post post post post post post post
patient_id 3 4 17 5 13 11 2 12 9 8 7 10 3 4 17 5 13 11 2 12 9 8 7 10

3.5 Heatmap (Pre-treatment vs Post-treatment)

3.6 Heatmap for Log-FoldChange for each patient
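
As an illustration of a per-patient log-fold change, the sketch below computes log2((post + c) / (pre + c)) for PTPRK from the TPM table in 3.4. The pseudocount c = 1 is an assumption; the report does not state how the plotted values were derived.

```python
import math

# Column order of the TPM table in 3.4 (same order for pre and post).
PATIENTS = [3, 4, 17, 5, 13, 11, 2, 12, 9, 8, 7, 10]
PTPRK_PRE = [0.74, 0.58, 1.01, 1.07, 0.91, 0.44, 1.95, 0.60, 1.38, 0.93, 0.51, 1.50]
PTPRK_POST = [1.37, 2.68, 2.35, 3.05, 1.00, 0.76, 2.48, 0.59, 1.72, 1.78, 1.39, 2.82]
PSEUDOCOUNT = 1.0  # assumed value, avoids division by near-zero TPM

def log2_fold_change(pre, post, pseudocount=PSEUDOCOUNT):
    """Paired log2 fold change (post over pre) with a pseudocount."""
    return math.log2((post + pseudocount) / (pre + pseudocount))

lfc = {p: log2_fold_change(a, b)
       for p, a, b in zip(PATIENTS, PTPRK_PRE, PTPRK_POST)}
n_up = sum(value > 0 for value in lfc.values())

for patient, value in sorted(lfc.items()):
    print(f"patient {patient}: log2FC = {value:+.2f}")
print(f"{n_up} of {len(lfc)} patients show higher PTPRK after treatment")
```
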

4 Run LASSO on treatment-negative module (Module darkred: 63 genes)

  • Alpha = 1

  • Nested cross validation

    • outer loop method: leave-one-out
    • inner loop method: k-fold; leave-one-out
## Tuned lambda value:
##  0.1392525
## 
## Call:  cv.glmnet(x = x, y = y, weights = ..2, foldid = foldid, alpha = tail(alphaSet,      1), family = ..1, penalty.factor = ..3) 
## 
## Measure: Binomial Deviance 
## 
##     Lambda Index Measure      SE Nonzero
## min 0.1393    11   1.505 0.10086       3
## 1se 0.2217     1   1.506 0.01945       0
## Non-zero Coefficients:
##  ENSG00000169519 ENSG00000279982 ENSG00000272502

4.1 AUC

4.2 Accuracy

##    
##      0  1
##   0 12  0
##   1  0 12
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##   1.000000e+00   1.000000e+00   8.575264e-01   1.000000e+00   5.000000e-01 
## AccuracyPValue  McnemarPValue 
##   5.960464e-08            NaN
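
The interval and p-value above can be reproduced by hand. The layout matches what caret's `confusionMatrix` prints, which uses an exact binomial interval and a one-sided test against the no-information rate; that this function produced the output is an assumption.

```python
n_correct, n_total = 24, 24  # all 24 samples on the diagonal
alpha = 0.05                 # for a 95% interval
no_information_rate = 0.5    # AccuracyNull: 12 of 24 samples per class

# Exact (Clopper-Pearson) lower bound when every prediction is correct;
# the upper bound is 1 in that case.
accuracy_lower = (alpha / 2) ** (1 / n_total)

# One-sided binomial test: probability of 24 correct calls out of 24
# if the true accuracy were only the no-information rate.
accuracy_p_value = no_information_rate ** n_correct

print(f"AccuracyLower  = {accuracy_lower:.7f}")
print(f"AccuracyPValue = {accuracy_p_value:.6e}")
```

With only 24 samples, even a perfect table leaves a lower bound of about 0.86.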

4.3 List of genes with non-zero coefficients (3 genes)

ensembl_gene_id external_gene_name
ENSG00000169519 METTL15
ENSG00000272502 ENSG00000272502
ENSG00000279982 ENSG00000279982

4.4 TPM (pre vs. post)

sample 160008699_3_0_S5 290001824_4_0_S7 330001842_17_0_S31 470009458_5_0_S9 660009823_13_0_S25 770004766_11_0_S21 830001304_2_0_S3 830002078_12_0_S23 880001252_9_0_S17 940004357_8_0_S15 970002731_7_0_S13 980007758_10_0_S19 160008699_3_8_S6 290001824_4_8_S8 330001842_17_8_S32 470009458_5_4_S10 660009823_13_8_S26 770004766_11_8_S22 830001304_2_4_S4 830002078_12_8_S24 880001252_9_8_S18 940004357_8_8_S16 970002731_7_4_S14 980007758_10_8_S20
METTL15 5.35 3.64 1.91 1.98 3.71 1.05 4.07 2.33 3.53 1.91 2.70 2.80 2.49 3.16 2.29 1.63 0.76 1.83 2.69 2.01 2.06 1.64 1.61 2.05
ENSG00000272502 0.69 2.28 0.76 1.03 0.74 0.37 1.43 0.70 1.53 0.76 0.40 1.29 1.36 1.16 0.26 0.45 0.38 0.68 0.58 0.65 0.54 0.45 0.45 0.75
ENSG00000279982 0.17 0.25 0.17 0.11 0.58 0.06 0.33 0.28 0.38 0.18 0.20 0.28 0.13 0.29 0.16 0.12 0.06 0.05 0.21 0.12 0.15 0.23 0.10 0.14
treatment pre pre pre pre pre pre pre pre pre pre pre pre post post post post post post post post post post post post
patient_id 3 4 17 5 13 11 2 12 9 8 7 10 3 4 17 5 13 11 2 12 9 8 7 10

4.5 Heatmap (Pre-treatment vs Post-treatment)

4.6 Heatmap for Log-FoldChange for each patient

5 Run LASSO on treatment-negative module (Module midnightblue: 303 genes)

  • Alpha = 1

  • Nested cross validation

    • outer loop method: leave-one-out
    • inner loop method: leave-one-out
## Tuned lambda value:
##  0.1253536
## 
## Call:  cv.glmnet(x = x, y = y, weights = ..2, foldid = foldid, alpha = tail(alphaSet,      1), family = ..1, penalty.factor = ..3) 
## 
## Measure: Binomial Deviance 
## 
##     Lambda Index Measure      SE Nonzero
## min 0.1253    14   1.443 0.10450       2
## 1se 0.2295     1   1.495 0.01162       0
## Non-zero Coefficients:
##  ENSG00000186073 ENSG00000164120

5.1 AUC

5.2 Accuracy

##    
##      0  1
##   0 12  0
##   1  0 12
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##   1.000000e+00   1.000000e+00   8.575264e-01   1.000000e+00   5.000000e-01 
## AccuracyPValue  McnemarPValue 
##   5.960464e-08            NaN

5.3 List of genes with non-zero coefficients (2 genes)

ensembl_gene_id external_gene_name
ENSG00000164120 HPGD
ENSG00000186073 CDIN1

5.4 TPM (pre vs. post)

sample 160008699_3_0_S5 290001824_4_0_S7 330001842_17_0_S31 470009458_5_0_S9 660009823_13_0_S25 770004766_11_0_S21 830001304_2_0_S3 830002078_12_0_S23 880001252_9_0_S17 940004357_8_0_S15 970002731_7_0_S13 980007758_10_0_S19 160008699_3_8_S6 290001824_4_8_S8 330001842_17_8_S32 470009458_5_4_S10 660009823_13_8_S26 770004766_11_8_S22 830001304_2_4_S4 830002078_12_8_S24 880001252_9_8_S18 940004357_8_8_S16 970002731_7_4_S14 980007758_10_8_S20
HPGD 4.38 3.06 0.78 1.14 1.61 2.61 4.23 2.18 4.24 2.07 1.90 3.77 1.95 3.31 0.82 1.07 1.84 2.90 2.15 1.00 1.47 1.06 0.91 2.23
CDIN1 1.14 2.48 1.36 1.04 1.94 0.74 2.13 1.24 1.95 1.05 1.43 1.74 1.35 1.72 1.58 0.96 0.58 0.52 1.61 0.81 0.77 0.72 0.83 1.15
treatment pre pre pre pre pre pre pre pre pre pre pre pre post post post post post post post post post post post post
patient_id 3 4 17 5 13 11 2 12 9 8 7 10 3 4 17 5 13 11 2 12 9 8 7 10

5.5 Heatmap (Pre-treatment vs Post-treatment)

5.6 Heatmap for Log-FoldChange for each patient

6 Final model

6.1 Final LASSO nested CV (15 genes in total)

## Tuned lambda value:
##  0.02060171
## 
## Call:  cv.glmnet(x = x, y = y, weights = ..2, foldid = foldid, alpha = tail(alphaSet,      1), family = ..1, penalty.factor = ..3) 
## 
## Measure: Binomial Deviance 
## 
##      Lambda Index Measure     SE Nonzero
## min 0.02060    29  0.8468 0.2593      11
## 1se 0.09128    13  1.1030 0.1491       9
## Non-zero Coefficients:
##  ENSG00000186073 ENSG00000152894 ENSG00000058091 ENSG00000112232 ENSG00000173898 ENSG00000168916 ENSG00000164120 ENSG00000166833 ENSG00000169519 ENSG00000084072 ENSG00000101057

6.1.1 AUC

6.1.2 Accuracy of nested CV

##          Reference
## Predicted 0 1
##         0 9 3
##         1 3 9
##               AUC          Accuracy Balanced accuracy 
##             0.875             0.750             0.750

6.2 Final model (11 genes)

## 
## Call:
## glm(formula = formula_str, family = binomial, data = data.frame(final_model_matrix))
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)         -3.072 355569.691       0        1
## ENSG00000186073    -13.996 159559.523       0        1
## ENSG00000152894      7.460 155528.692       0        1
## ENSG00000058091     34.317 129080.505       0        1
## ENSG00000112232     11.136 596107.470       0        1
## ENSG00000173898    -32.154 390710.815       0        1
## ENSG00000168916     -5.833 276813.334       0        1
## ENSG00000164120    -20.629 866630.487       0        1
## ENSG00000166833     -8.239 173876.233       0        1
## ENSG00000169519     12.471 946415.656       0        1
## ENSG00000084072     -1.015 278030.257       0        1
## ENSG00000101057     -6.527 594384.931       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3.3271e+01  on 23  degrees of freedom
## Residual deviance: 4.1900e-10  on 12  degrees of freedom
## AIC: 24
## 
## Number of Fisher Scoring iterations: 25

6.2.1 Check correlation between each feature

6.3 Check complete separation (perfect prediction)

## Implementation: ROI | Solver: lpsolve 
## Separation: TRUE 
## Existence of maximum likelihood estimates
##     (Intercept) ENSG00000186073 ENSG00000152894 ENSG00000058091 ENSG00000112232 
##            -Inf            -Inf             Inf             Inf             Inf 
## ENSG00000173898 ENSG00000168916 ENSG00000164120 ENSG00000166833 ENSG00000169519 
##            -Inf            -Inf            -Inf             Inf            -Inf 
## ENSG00000084072 ENSG00000101057 
##            -Inf             Inf 
## 0: finite value, Inf: infinity, -Inf: -infinity
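
Complete separation explains the glm output in 6.2 (huge standard errors, residual deviance near zero, 25 Fisher scoring iterations): the likelihood keeps improving as the coefficients grow, so no finite maximum exists. The sketch below shows this on a hypothetical, perfectly separated toy data set with a single slope; it is not the report's data.

```python
import math

# Hypothetical toy data: every x < 0 has y = 0 and every x > 0 has y = 1.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(beta):
    """Logistic log-likelihood for a single slope and no intercept."""
    return sum(log_sigmoid(beta * x if y == 1 else -beta * x)
               for x, y in zip(X, Y))

slopes = (1, 5, 25)
values = [log_likelihood(b) for b in slopes]
for b, v in zip(slopes, values):
    print(f"beta = {b:>2}: log-likelihood = {v:.3e}")
```

The log-likelihood climbs toward 0 as the slope increases, which is why the unpenalized fit diverges and a penalized or Bayesian fit is used in 6.4 and 6.5.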

6.4 Method 1: Bayesian logistic regression (bayesglm)

## bayesglm(formula = formula_str, family = binomial(link = "logit"), 
##     data = as.data.frame(final_model_matrix))
##                 coef.est coef.se
## (Intercept)      0.06     0.82  
## ENSG00000186073 -0.84     0.87  
## ENSG00000152894  1.26     0.92  
## ENSG00000058091  1.30     0.91  
## ENSG00000112232  1.07     0.74  
## ENSG00000173898 -0.77     0.79  
## ENSG00000168916 -0.43     0.75  
## ENSG00000164120 -0.38     0.78  
## ENSG00000166833 -0.37     0.65  
## ENSG00000169519 -0.31     0.87  
## ENSG00000084072 -0.29     0.84  
## ENSG00000101057 -0.35     0.81  
## ---
## n = 24, k = 12
## residual deviance = 4.8, null deviance = 33.3 (difference = 28.5)
## 
## Call:  bayesglm(formula = formula_str, family = binomial(link = "logit"), 
##     data = as.data.frame(final_model_matrix), method = "detect_separation")
## 
## Coefficients:
##     (Intercept)  ENSG00000186073  ENSG00000152894  ENSG00000058091  
##         0.06073         -0.83927          1.26230          1.30375  
## ENSG00000112232  ENSG00000173898  ENSG00000168916  ENSG00000164120  
##         1.06638         -0.77216         -0.43275         -0.37849  
## ENSG00000166833  ENSG00000169519  ENSG00000084072  ENSG00000101057  
##        -0.36640         -0.30720         -0.29201         -0.35347  
## 
## Degrees of Freedom: 23 Total (i.e. Null);  12 Residual
## Null Deviance:       33.27 
## Residual Deviance: 4.77  AIC: 28.77

6.4.1 AUC: Bayes

6.5 Method 2: Firth’s Bias-Reduced Logistic Regression

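Firth’s bias-reduction method is equivalent to penalizing the log-likelihood with the Jeffreys invariant prior; the penalty keeps the coefficient estimates finite even under the complete separation detected in 6.3.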
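
A minimal sketch of the idea, on hypothetical toy data with one slope and no intercept (not the report's data, and not how `logistf` is implemented): the penalized log-likelihood is l(beta) + 0.5 * log I(beta), where I(beta) is the Fisher information. On perfectly separated data the penalty term falls faster than the log-likelihood rises, so a finite maximum exists.

```python
import math

# Hypothetical, perfectly separated toy data.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def penalized_log_likelihood(beta):
    """Firth-type objective: log-likelihood plus 0.5 * log(Fisher information)."""
    loglik = 0.0
    information = 0.0
    for x, y in zip(X, Y):
        p = sigmoid(beta * x)
        loglik += math.log(p if y == 1 else 1.0 - p)
        information += x * x * p * (1.0 - p)
    return loglik + 0.5 * math.log(information)

# Coarse grid search over the slope; enough to locate the maximum.
grid = [i / 100 for i in range(0, 1001)]
best = max(grid, key=penalized_log_likelihood)
print(f"penalized estimate of the slope: {best:.2f}")
```

The maximizer lies well inside the grid, in contrast to the unpenalized likelihood, which would push the slope to the grid's upper end.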

## logistf(formula = formula_str, data = as.data.frame(final_model_matrix))
## 
## Model fitted by Penalized ML
## Coefficients:
##                       coef  se(coef) lower 0.95 upper 0.95      Chisq         p
## (Intercept)      0.1467598 0.4644579 -1.1980458  2.3826018 0.06886319 0.7929992
## ENSG00000186073 -0.3129507 0.6202474 -2.3955619  1.2965489 0.20811403 0.6482496
## ENSG00000152894  0.6736023 0.7548255 -1.3659300  3.7205358 0.63266700 0.4263787
## ENSG00000058091  1.1690416 0.8029840 -0.6709524  4.4023020 1.64697932 0.1993706
## ENSG00000112232  0.6541397 0.5899478 -1.5437707  3.0661566 0.97581356 0.3232346
## ENSG00000173898 -0.6883909 0.5775987 -5.4198388  0.4593585 1.34411190 0.2463101
## ENSG00000168916 -0.4959154 0.6773029 -2.9708523  1.3429128 0.37323096 0.5412484
## ENSG00000164120 -0.3078037 0.8503156 -3.5850636  1.9199496 0.10363021 0.7475160
## ENSG00000166833 -0.4104273 0.4879591 -2.0885087  1.2699108 0.54638992 0.4597965
## ENSG00000169519  0.1244505 1.0006652 -4.1150175  2.5163733 0.01109416 0.9161149
## ENSG00000084072 -0.1642066 1.0510126 -2.6306083  3.6718098 0.01928157 0.8895623
## ENSG00000101057 -0.2105312 0.6647076 -3.3545827  1.8218956 0.07317914 0.7867630
##                 method
## (Intercept)          2
## ENSG00000186073      2
## ENSG00000152894      2
## ENSG00000058091      2
## ENSG00000112232      2
## ENSG00000173898      2
## ENSG00000168916      2
## ENSG00000164120      2
## ENSG00000166833      2
## ENSG00000169519      2
## ENSG00000084072      2
## ENSG00000101057      2
## 
## Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None
## 
## Likelihood ratio test=16.48283 on 11 df, p=0.1241313, n=24
## Wald test = 10.86186 on 11 df, p = 0.4549077
## logistf(formula = formula_str, data = as.data.frame(final_model_matrix), 
##     method = "detect_separation")
## Model fitted by Penalized ML
## Confidence intervals and p-values by Profile Likelihood 
## 
## Coefficients:
##     (Intercept) ENSG00000186073 ENSG00000152894 ENSG00000058091 ENSG00000112232 
##       0.1467598      -0.3129507       0.6736023       1.1690416       0.6541397 
## ENSG00000173898 ENSG00000168916 ENSG00000164120 ENSG00000166833 ENSG00000169519 
##      -0.6883909      -0.4959154      -0.3078037      -0.4104273       0.1244505 
## ENSG00000084072 ENSG00000101057 
##      -0.1642066      -0.2105312 
## 
## Likelihood ratio test=16.48283 on 11 df, p=0.1241313, n=24
## [1] "AUC (test): 1"

## [1] "Accuracy (test): 1"

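Check of the direction (sign) of each gene's coefficient in the two models: the signs agree for 10 of the 11 genes. The exception is ENSG00000169519 (METTL15), which is negative in the Bayesian fit (-0.31) and positive in the Firth fit (0.12); both estimates are small relative to their standard errors (0.87 and 1.00).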
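
The sign comparison can be reproduced directly from the coefficients printed in 6.4 and 6.5 (intercepts omitted):

```python
# Coefficients copied from the bayesglm (6.4) and logistf (6.5) output.
BAYES = {
    "ENSG00000186073": -0.83927, "ENSG00000152894": 1.26230,
    "ENSG00000058091": 1.30375, "ENSG00000112232": 1.06638,
    "ENSG00000173898": -0.77216, "ENSG00000168916": -0.43275,
    "ENSG00000164120": -0.37849, "ENSG00000166833": -0.36640,
    "ENSG00000169519": -0.30720, "ENSG00000084072": -0.29201,
    "ENSG00000101057": -0.35347,
}
FIRTH = {
    "ENSG00000186073": -0.3129507, "ENSG00000152894": 0.6736023,
    "ENSG00000058091": 1.1690416, "ENSG00000112232": 0.6541397,
    "ENSG00000173898": -0.6883909, "ENSG00000168916": -0.4959154,
    "ENSG00000164120": -0.3078037, "ENSG00000166833": -0.4104273,
    "ENSG00000169519": 0.1244505, "ENSG00000084072": -0.1642066,
    "ENSG00000101057": -0.2105312,
}

# Genes whose coefficient sign differs between the two fits.
disagree = [gene for gene in BAYES
            if (BAYES[gene] > 0) != (FIRTH[gene] > 0)]
n_agree = len(BAYES) - len(disagree)
print(f"{n_agree} of {len(BAYES)} genes agree in sign; differing: {disagree}")
```
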

7 Final gene list (11 genes in total)

Modules (size) Module correlation to treatment Genes selected by LASSO
lightgreen (152 genes) Positive 8
darkred (63 genes) Negative 1
midnightblue (303 genes) Negative 2

ensembl_gene_id external_gene_name
1 ENSG00000058091 CDK14
2 ENSG00000084072 PPIE
3 ENSG00000101057 MYBL2
4 ENSG00000112232 KHDRBS2
5 ENSG00000152894 PTPRK
6 ENSG00000166833 NAV2
7 ENSG00000168916 ZNF608
8 ENSG00000173898 SPTBN2
9 ENSG00000169519 METTL15
10 ENSG00000164120 HPGD
11 ENSG00000186073 CDIN1
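
The per-module counts can be traced back to the 15 candidates from sections 3 to 5. The sketch below tallies which candidates remain in the final 11-gene list and which were dropped by the final LASSO in 6.1; gene names are taken from the tables above.

```python
# Candidate genes per module (sections 3.1, 4.3 and 5.3).
CANDIDATES = {
    "lightgreen": ["KCNG1", "CDK14", "PPIE", "MYBL2", "KHDRBS2",
                   "SOX5", "PTPRK", "NAV2", "ZNF608", "SPTBN2"],
    "darkred": ["METTL15", "ENSG00000272502", "ENSG00000279982"],
    "midnightblue": ["HPGD", "CDIN1"],
}
# Final gene list from the table above.
FINAL = {"CDK14", "PPIE", "MYBL2", "KHDRBS2", "PTPRK", "NAV2",
         "ZNF608", "SPTBN2", "METTL15", "HPGD", "CDIN1"}

kept = {module: [g for g in genes if g in FINAL]
        for module, genes in CANDIDATES.items()}
dropped = [g for genes in CANDIDATES.values() for g in genes if g not in FINAL]

for module, genes in kept.items():
    print(f"{module}: {len(genes)} kept")
print("dropped:", dropped)
```
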

7.1 Heatmap (Pre-treatment vs Post-treatment)

7.2 Heatmap for Log-FoldChange for each patient

8 Pathway Analysis (ORA)

Reactome

KEGG